Fusing Economic Survey Datasets With the Synthimpute Python Package | PyData Global 2021

Duration: 26:16
Views: 263
Likes: 5
Date Created: Jan 2022

Channel: PyData

Tags: python learn to code education software pydata learn coding how to program julia opensource scientific programming numfocus python 3 tutorial

Description: Fusing Economic Survey Datasets With the Synthimpute Python Package

Speaker: Max Ghenis

Summary

Economists often fuse or impute features between survey datasets to simulate economic policies. I describe common methods such as regression and matching, and show how the synthimpute package's random-forest-based method outperforms them on holdout sets. I also give examples of how PolicyEngine uses synthimpute to research a range of universal basic income policies.

Description

When economists simulate policy reforms, they turn to representative survey microdata. For example, policy reforms in the United States are often evaluated by simulating the impact on each of the 60,000 households that respond to the Current Population Survey. But the Current Population Survey doesn't include all household information that might be relevant to a policy; for example, it doesn't include wealth or carbon emissions. Simulating wealth taxes or carbon taxes with the Current Population Survey therefore requires imputing wealth from the Survey of Consumer Finances, or carbon emissions from the Consumer Expenditure Survey.

Typical methods for this include regression (sometimes a mix of logistic and linear regression) and statistical matching. However, these methods tend to underestimate predictive uncertainty, which can produce too few extreme values. They also don't fully account for interactions among the predictors common to both datasets, and they can't easily be adjusted for systematic under- or over-reporting of the predicted quantity in the survey.

I introduce the synthimpute Python package and its random-forest-based methods for data fusion. I show that this approach outperforms current methods on common imputation tasks, as assessed by quantile loss. I also demonstrate its ability to optimize deviations from uniform quantile selection so that imputations sum to administrative targets.
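The talk assesses imputation methods with quantile loss. As a minimal sketch (the standard pinball loss written from scratch, not code taken from synthimpute), the loss of a predicted τ-quantile ŷ for an observed value y is ρ_τ(y − ŷ) = max(τ(y − ŷ), (τ − 1)(y − ŷ)). Averaging it over several quantile levels rewards an imputation that reproduces the spread of the target, not only its center. The function names and the toy holdout values below are illustrative assumptions.

```python
from statistics import mean


def quantile_loss(y_true, y_pred, tau):
    """Pinball loss of a predicted tau-quantile y_pred for an observed y_true."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs tau per unit; over-prediction costs 1 - tau.
    return max(tau * diff, (tau - 1) * diff)


def mean_quantile_loss(y_true, quantile_preds, taus):
    """Average pinball loss over holdout rows and quantile levels.

    quantile_preds[i][j] is the predicted taus[j]-quantile for holdout row i.
    """
    return mean(
        quantile_loss(y, preds[j], tau)
        for y, preds in zip(y_true, quantile_preds)
        for j, tau in enumerate(taus)
    )


# Toy holdout values (hypothetical) scored at three quantile levels.
taus = [0.1, 0.5, 0.9]
holdout = [1, 2, 3, 4, 5]
point = [[3, 3, 3]] * len(holdout)       # one value reused at every level
spread = [[1.4, 3, 4.6]] * len(holdout)  # distinct low, median, and high quantiles
print(quantile_loss(10.0, 8.0, 0.9))     # 1.8
print(mean_quantile_loss(holdout, point, taus), mean_quantile_loss(holdout, spread, taus))
```

On this toy data the single point prediction scores worse than the set of spread-out quantiles, which is the behavior a quantile-loss comparison is meant to expose.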
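The description argues that regression-style imputation underestimates predictive uncertainty and yields too few extreme values. The sketch below illustrates that contrast under a loud simplification: donors are grouped into cells by one shared predictor, standing in for the random-forest leaves a method like synthimpute would use (this grouping is an assumption for illustration, not the package's algorithm). Giving each recipient its cell's mean mimics regression; drawing a uniform random quantile from the cell's donor values keeps the conditional spread. All names and numbers are hypothetical.

```python
import random
from statistics import mean, pvariance


def empirical_quantile(sorted_values, q):
    """Lower empirical quantile of an ascending list, for q in [0, 1]."""
    idx = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def build_cells(donors):
    """Group donor target values by a shared predictor value.

    Each cell stands in for the donors that share a random-forest leaf with
    a recipient; this is a simplification, NOT a real forest.
    """
    cells = {}
    for key, value in donors:
        cells.setdefault(key, []).append(value)
    return {key: sorted(values) for key, values in cells.items()}


def impute_conditional_mean(cells, recipients):
    """Regression-style imputation: each recipient gets its cell's mean."""
    return [mean(cells[key]) for key in recipients]


def impute_random_quantile(cells, recipients, rng):
    """Draw q ~ Uniform(0, 1) per recipient and take that quantile of its cell."""
    return [empirical_quantile(cells[key], rng.random()) for key in recipients]


# Hypothetical donor survey: (tenure, wealth in $000s). Toy numbers only.
donors = [("renter", 0), ("renter", 5), ("renter", 10),
          ("owner", 50), ("owner", 200), ("owner", 900)]
cells = build_cells(donors)
recipients = ["renter", "owner"] * 500
rng = random.Random(0)

by_mean = impute_conditional_mean(cells, recipients)
by_quantile = impute_random_quantile(cells, recipients, rng)
# Conditional means collapse each cell to one value, shrinking the spread.
print(pvariance(by_mean) < pvariance(by_quantile), max(by_mean), max(by_quantile))
```

The mean-imputed values never reach the largest donor value, while random-quantile draws reproduce it, which mirrors the "too few extreme values" problem the talk describes.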
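The description also mentions optimizing deviations from uniform quantile selection so that imputations sum to administrative targets. One way to sketch the idea, assuming a single shape parameter is enough (an assumption; synthimpute's actual optimization may differ), is to keep each recipient's uniform draw u fixed and tilt it through the Beta(a, 1) inverse CDF, q = u^(1/a). Because the weighted total never decreases as a grows, bisection on log(a) finds the smallest tilt that reaches the target. The cells, weights, target, and function names are hypothetical, and the target must be reachable within the bracketed range of a.

```python
import math
import random


def empirical_quantile(sorted_values, q):
    """Lower empirical quantile of an ascending list, for q in [0, 1]."""
    idx = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def impute_with_shape(cells, recipients, uniforms, a):
    """Map fixed uniforms through the Beta(a, 1) inverse CDF, q = u ** (1 / a).

    a = 1 keeps uniform quantile selection; a > 1 shifts draws toward upper
    quantiles and a < 1 toward lower ones.
    """
    return [empirical_quantile(cells[key], u ** (1.0 / a))
            for key, u in zip(recipients, uniforms)]


def weighted_total(values, weights):
    return sum(w * v for w, v in zip(weights, values))


def calibrate_to_target(cells, recipients, weights, target, seed=0, iters=60):
    """Bisect on log(a) for the smallest a whose weighted total reaches target.

    The total is nondecreasing in a (each q = u ** (1 / a) rises with a and
    weights are positive), so bisection is valid. Totals move in discrete
    jumps, so the result may land slightly above the target.
    """
    rng = random.Random(seed)
    uniforms = [rng.random() for _ in recipients]  # reused for every a
    lo, hi = math.log(1e-3), math.log(1e3)
    for _ in range(iters):
        mid = (lo + hi) / 2
        values = impute_with_shape(cells, recipients, uniforms, math.exp(mid))
        if weighted_total(values, weights) < target:
            lo = mid
        else:
            hi = mid
    a = math.exp(hi)
    return a, impute_with_shape(cells, recipients, uniforms, a)


# Hypothetical example: 100 owner households with weight 1 each and a
# made-up administrative wealth total of 60,000 ($000s).
cells = {"owner": [50, 200, 900]}
recipients = ["owner"] * 100
weights = [1.0] * 100
a, imputed = calibrate_to_target(cells, recipients, weights, target=60_000)
print(round(a, 2), weighted_total(imputed, weights))
```

Reusing the same uniforms for every candidate a is the design choice that makes the total monotone in a; redrawing them at each step would make bisection unreliable.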
I contextualize the synthimpute technology with examples from PolicyEngine, a tech nonprofit that lets anyone reform the tax and benefit system and see the impact on society and on their own household.

Max Ghenis's Bio

I'm the founder and president of the UBI Center, a think tank that researches universal basic income policies. Previously, I worked at Google as a data scientist. I have a B.A. in Operations Research from UC Berkeley and an M.S. in Data, Economics, and Development Policy from MIT.

GitHub: github.com/MaxGhenis
Twitter: twitter.com/MaxGhenis
LinkedIn: linkedin.com/in/MaxGhenis
Website: maxghenis.com/home.html

PyData Global 2021 website: pydata.org/global2021
LinkedIn: linkedin.com/company/pydata-global
Twitter: twitter.com/PyData
pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) nonprofit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.